Iterative Linear Association Analysis (ILAA) is a computational method that creates a linear transformation from a sample of multidimensional data that effectively removes linear associations between data variables. The returned transformation matrix can be used to:
Do an exploratory analysis of latent variables and their association to all the observed variables
Do exploratory discovery of latent variables associated with a specific outcome target
Address multicollinearity issues in linear regression models
Better estimation and interpretation of model variables
Improve linear model performance
Simplify the multidimensional search space for many ML algorithms
The objective of this tutorial is to guide users in using the ILAA to effectively accomplish the aforementioned tasks. The tutorial will showcase:
Transform a data frame affected by multicollinearity into a new data frame with minimum correlation among variables
Visualize the transformation matrix
Explore the returned formulas for each one of the returned latent variables
Understand and interpret the returned latent variables
Use ILAA as a pre-processing step to model a specific target outcome using linear models
ILAA is a wrapper of the more general data decorrelation algorithm (IDeA) implemented in R. Both are part of the FRESA.CAD 3.4.6 package.
## From GitHub
# First install the devtools package
#library(devtools)
#install_github("joseTamezPena/FRESA.CAD")
## For ILAA
library("FRESA.CAD")
## For network analysis
library(igraph)
## For multicollinearity
library(multiColl)
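library(car)
## For str_remove() and str_remove_all(), used in the code below
library(stringr)
## Save the default graphical parameters; the plotting code below restores them with par(op)
op <- par(no.readonly = TRUE)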
For this tutorial I’ll use the body-fat prediction data set. The data was downloaded from Kaggle:
https://www.kaggle.com/datasets/fedesoriano/body-fat-prediction-dataset
The Kaggle data disclaimer:
“Source The data were generously supplied by Dr. A. Garth Fisher who gave permission to freely distribute the data and use for non-commercial purposes.
Roger W. Johnson Department of Mathematics & Computer Science South Dakota School of Mines & Technology 501 East St. Joseph Street Rapid City, SD 57701
email address: rwjohnso@silver.sdsmt.edu web address: http://silver.sdsmt.edu/~rwjohnso”
The following code snippet loads the data and removes the density information. It also computes the Body Mass Index (BMI) and removes subjects with a BMI above 50, which are treated as data errors.
body_fat <- read.csv("~/GitHub/LatentBiomarkers/Data/BodyFat/BodyFat.csv", header=TRUE)
### Removing density as estimator
body_fat$Density <- NULL
body_fat$BMI <- 10000*body_fat$Weight*0.453592/((body_fat$Height*2.54)^2)
## Removing subjects with data errors
body_fat <- body_fat[body_fat$BMI<=50,]
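The BMI line above converts pounds to kilograms and inches to centimetres before applying the usual kg/m^2 formula. As a minimal sketch of the same arithmetic, the helper below (a hypothetical function name, not part of FRESA.CAD) computes the BMI of a made-up subject who weighs 180 lb and is 70 in tall.

```r
# Hypothetical helper (not part of FRESA.CAD): BMI in kg/m^2 from pounds and inches.
bmi_from_imperial <- function(weight_lb, height_in) {
  weight_kg <- weight_lb * 0.453592   # pounds -> kilograms
  height_cm <- height_in * 2.54       # inches -> centimetres
  10000 * weight_kg / height_cm^2     # factor 10000 converts kg/cm^2 to kg/m^2
}

# Made-up subject: 180 lb, 70 in
bmi_example <- bmi_from_imperial(180, 70)
print(round(bmi_example, 2))   # about 25.83
```

A value above 50 from this formula would be filtered out by the last line of the snippet above.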
The ILAA function is:
decorrelatedData <- ILAA(data=NULL,
thr=0.80,
method=c("pearson","spearman"),
Outcome=NULL,
drivingFeatures=NULL,
maxLoops=100,
verbose=FALSE,
bootstrap=0
)
where:
data : The source data frame.
thr : The target correlation goal.
method : The correlation measure, either "pearson" or "spearman".
Outcome : The name of the target variable; it is required for supervised learning.
drivingFeatures : A set of variables that are intended to serve as unaltered basis vectors.
maxLoops : The maximum number of iteration cycles.
verbose : If TRUE, displays the evolution of the algorithm.
bootstrap : The number of bootstrap estimations.
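To give a feeling for what the choice of correlation measure implies, here is a minimal base-R sketch on simulated data. It does not call ILAA and says nothing about its internals; it only compares how the two measures offered by the method argument react to a single gross data error. All variable names and values are made up for illustration.

```r
# Simulated, made-up data: y follows x closely
set.seed(1)
x <- 1:30
y <- x + rnorm(30, sd = 2)

# Same data with one gross recording error in the last observation
y_out <- y
y_out[30] <- -200

pearson_clean <- cor(x, y, method = "pearson")
pearson_out   <- cor(x, y_out, method = "pearson")
spearman_out  <- cor(x, y_out, method = "spearman")

# Pearson collapses with the outlier; the rank-based Spearman measure degrades far less
print(round(c(pearson_clean = pearson_clean,
              pearson_out = pearson_out,
              spearman_out = spearman_out), 2))
```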
To help users take advantage of the ILAA-transformed object, FRESA.CAD provides the following auxiliary functions:
newTransformedData <- predictDecorrelate(decorrelatedData,NewData)
theBetaCoefficients <- getLatentCoefficients(decorrelatedData)
fromLatenttoObserved <- getObservedCoef(decorrelatedData,latentModel)
predictDecorrelate() rotates any new data set based on the output of an ILAA-transformed data set.
getLatentCoefficients() returns a list of the beta coefficients for each one of the discovered latent variables.
getObservedCoef() returns the beta coefficients, in the observed space, of any linear model that was trained on the UPLTM space.
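The linear algebra behind these helpers can be sketched in a few lines of base R. The sketch assumes that the transformation acts as a matrix product, with rows indexing observed variables and columns indexing latent variables; the matrix W, the variable names, and the coefficients are all hypothetical, and the code does not reproduce the internals of predictDecorrelate() or getObservedCoef().

```r
# Hypothetical 3-variable transformation: rows = observed inputs, columns = latent outputs
W <- diag(3)
dimnames(W) <- list(c("A", "B", "C"), c("A", "La_B", "La_C"))
W["A", "La_B"] <- -0.5   # La_B = B - 0.5*A
W["A", "La_C"] <- -0.2   # La_C = C - 0.2*A - 0.3*B
W["B", "La_C"] <- -0.3

# New observations (simulated) are transformed with the same matrix
set.seed(10)
X_new <- matrix(rnorm(15), ncol = 3, dimnames = list(NULL, c("A", "B", "C")))
Z_new <- X_new %*% W

# Made-up coefficients of a linear model fitted on the latent space
beta_latent <- c(A = 1.0, La_B = 2.0, La_C = -1.0)

# The same model expressed on the observed variables A, B, C
beta_observed <- drop(W %*% beta_latent)

pred_latent   <- drop(Z_new %*% beta_latent)
pred_observed <- drop(X_new %*% beta_observed)
print(beta_observed)
print(all.equal(pred_latent, pred_observed))
```

Because the transformation is linear, a model on the latent variables and its back-projected version on the observed variables give identical predictions.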
By default, the ILAA function targets a correlation lower than 0.8 using the Pearson correlation measure. The user is free to choose robust fitting with the Spearman correlation measure and/or to reduce the level of feature association by lowering the threshold. The following snippet shows the different options.
# Default call
body_fat_Decorrelated <- ILAA(body_fat)
pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
| BodyFat | Age | Ankle | Forearm | Wrist | Weight | La_Biceps | La_Neck |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 0.359 | 0.303 |
| La_Knee | La_Thigh | La_BMI | La_Chest | La_Abdomen | La_Hip | La_Height |
|---|---|---|---|---|---|---|
| 0.272 | 0.242 | 0.211 | 0.171 | 0.15 | 0.11 | 0.0209 |
# Explore the convergence metrics in verbose mode
body_fat_Decorrelated <- ILAA(body_fat,verbose=TRUE)
fast | LM | Weight BodyFat Age Weight Height Neck Chest 0.40000000 0.06666667 1.00000000 0.13333333 0.53333333 0.73333333
Included: 15 , Uni p: 0.01 , Base Size: 1 , Rcrit: 0.1467743
1 <R=0.944,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888
2 <R=0.888,thr=0.800>, Top: 1< 5 >Fa= 2,<|><>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.860
3 <R=0.860,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.959
4 <R=0.959,thr=0.950>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.735
5 <R=0.735,thr=0.800>
[ 5 ], 0.4782625 Decor Dimension: 10 Nused: 10 . Cor to Base: 7 , ABase: 15 , Outcome Base: 0
pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
| BodyFat | Age | Ankle | Forearm | Wrist | Weight | La_Biceps | La_Neck |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 0.359 | 0.303 |
| La_Knee | La_Thigh | La_BMI | La_Chest | La_Abdomen | La_Hip | La_Height |
|---|---|---|---|---|---|---|
| 0.272 | 0.242 | 0.211 | 0.171 | 0.15 | 0.11 | 0.0209 |
# Robust Linear Fitting with spearman correlation measure
body_fat_Decorrelated <- ILAA(body_fat,method="spearman",verbose=TRUE)
spearman | RLM | Weight BodyFat Age Weight Height Neck Chest 0.46666667 0.06666667 1.00000000 0.13333333 0.53333333 0.80000000
Included: 15 , Uni p: 0.01 , Base Size: 1 , Rcrit: 0.1467743
1 <R=0.929,thr=0.900>, Top: 2< 1 >Fa= 2,<><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.872
2 <R=0.872,thr=0.800>, Top: 1< 4 >Fa= 2,<><>Tot Used: 8 , Added: 4 , Zero Std: 0 , Max Cor: 0.837
3 <R=0.837,thr=0.800>, Top: 1< 1 >Fa= 2,<><>Tot Used: 9 , Added: 1 , Zero Std: 0 , Max Cor: 0.990
4 <R=0.990,thr=0.950>, Top: 1< 1 >Fa= 2,<><>Tot Used: 9 , Added: 1 , Zero Std: 0 , Max Cor: 0.781
5 <R=0.781,thr=0.800>
[ 5 ], 0.4849516 Decor Dimension: 9 Nused: 9 . Cor to Base: 6 , ABase: 15 , Outcome Base: 0
pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
| BodyFat | Age | Ankle | Biceps | Forearm | Wrist | Weight | La_Neck | La_Knee |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.303 | 0.273 |
| La_Thigh | La_BMI | La_Chest | La_Abdomen | La_Hip | La_Height |
|---|---|---|---|---|---|
| 0.242 | 0.212 | 0.174 | 0.15 | 0.11 | 0.0221 |
# Lowering the threshold
body_fat_Decorrelated <- ILAA(body_fat,thr=0.4,verbose=TRUE)
fast | LM | Weight BodyFat Age Weight Height Neck Chest 0.40000000 0.06666667 1.00000000 0.13333333 0.53333333 0.73333333
Included: 15 , Uni p: 0.01 , Base Size: 1 , Rcrit: 0.1467743
1 <R=0.944,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888
2 <R=0.888,thr=0.800>, Top: 1< 5 >Fa= 2,<|><>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.860
3 <R=0.860,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.959
4 <R=0.959,thr=0.950>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.735
5 <R=0.735,thr=0.700>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 11 , Added: 1 , Zero Std: 0 , Max Cor: 0.631
6 <R=0.631,thr=0.600>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.501
7 <R=0.501,thr=0.500>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.584
8 <R=0.584,thr=0.500>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.478
9 <R=0.478,thr=0.400>, Top: 2< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.421
10 <R=0.421,thr=0.400>, Top: 1< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.375
11 <R=0.375,thr=0.400>
[ 11 ], 0.3726062 Decor Dimension: 14 Nused: 14 . Cor to Base: 12 , ABase: 15 , Outcome Base: 0
pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
| Age | Weight | La_Ankle | La_Forearm | La_Wrist | La_Biceps | La_BodyFat |
|---|---|---|---|---|---|---|
| 1 | 1 | 0.624 | 0.602 | 0.459 | 0.359 | 0.309 |
| La_Neck | La_Knee | La_BMI | La_Thigh | La_Abdomen | La_Hip | La_Chest | La_Height |
|---|---|---|---|---|---|---|---|
| 0.303 | 0.272 | 0.211 | 0.194 | 0.15 | 0.11 | 0.108 | 0.0209 |
# Trying to achieve the maximum independence between variables, i.e., thr=0.0
body_fat_Decorrelated <- ILAA(body_fat,thr=0.0,verbose=TRUE)
fast | LM | Weight BodyFat Age Weight Height Neck Chest 0.40000000 0.06666667 1.00000000 0.13333333 0.53333333 0.73333333
Included: 15 , Uni p: 0.01 , Base Size: 1 , Rcrit: 0.1467743
1 <R=0.944,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888
2 <R=0.888,thr=0.800>, Top: 1< 5 >Fa= 2,<|><>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.860
3 <R=0.860,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.959
4 <R=0.959,thr=0.950>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.735
5 <R=0.735,thr=0.700>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 11 , Added: 1 , Zero Std: 0 , Max Cor: 0.631
6 <R=0.631,thr=0.600>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.501
7 <R=0.501,thr=0.500>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.584
8 <R=0.584,thr=0.500>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.478
9 <R=0.478,thr=0.400>, Top: 2< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.421
10 <R=0.421,thr=0.400>, Top: 1< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.375
11 <R=0.375,thr=0.300>, Top: 4< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.384
12 <R=0.384,thr=0.300>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.343
13 <R=0.343,thr=0.300>, Top: 1< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.335
14 <R=0.335,thr=0.300>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.291
15 <R=0.291,thr=0.200>, Top: 4< 2 >Fa= 8,<|><>Tot Used: 15 , Added: 7 , Zero Std: 0 , Max Cor: 0.252
16 <R=0.252,thr=0.200>, Top: 3< 3 >Fa= 8,<|><>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.207
17 <R=0.207,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.217
18 <R=0.217,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.221
19 <R=0.221,thr=0.200>, Top: 1< 1 >Fa= 9,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.189
20 <R=0.189,thr=0.147>, Top: 5< 2 >Fa= 9,<|><>Tot Used: 15 , Added: 4 , Zero Std: 0 , Max Cor: 0.157
21 <R=0.157,thr=0.147>, Top: 5< 1 >Fa= 9,<><>Tot Used: 15 , Added: 0 , Zero Std: 0 , Max Cor: 0.157
[ 21 ], 0.1572866 Decor Dimension: 15 Nused: 15 . Cor to Base: 14 , ABase: 15 , Outcome Base: 0
pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
| Weight | La_Ankle | La_Forearm | La_Age | La_Wrist | La_Biceps | La_BodyFat |
|---|---|---|---|---|---|---|
| 1 | 0.54 | 0.518 | 0.483 | 0.403 | 0.32 | 0.29 |
| La_Neck | La_Knee | La_BMI | La_Thigh | La_Abdomen | La_Chest | La_Hip | La_Height |
|---|---|---|---|---|---|---|---|
| 0.269 | 0.228 | 0.211 | 0.183 | 0.128 | 0.101 | 0.0993 | 0.0209 |
For the rest of the tutorial I’ll set the correlation goal to 0.2 in verbose mode.
# Calling ILAA to achieve a final correlation of 0.2
body_fat_Decorrelated <- ILAA(body_fat,thr=0.2,verbose=TRUE)
fast | LM | Weight BodyFat Age Weight Height Neck Chest 0.40000000 0.06666667 1.00000000 0.13333333 0.53333333 0.73333333
Included: 15 , Uni p: 0.01 , Base Size: 1 , Rcrit: 0.1467743
1 <R=0.944,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888
2 <R=0.888,thr=0.800>, Top: 1< 5 >Fa= 2,<|><>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.860
3 <R=0.860,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.959
4 <R=0.959,thr=0.950>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.735
5 <R=0.735,thr=0.700>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 11 , Added: 1 , Zero Std: 0 , Max Cor: 0.631
6 <R=0.631,thr=0.600>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.501
7 <R=0.501,thr=0.500>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.584
8 <R=0.584,thr=0.500>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.478
9 <R=0.478,thr=0.400>, Top: 2< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.421
10 <R=0.421,thr=0.400>, Top: 1< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.375
11 <R=0.375,thr=0.300>, Top: 4< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.384
12 <R=0.384,thr=0.300>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.343
13 <R=0.343,thr=0.300>, Top: 1< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.335
14 <R=0.335,thr=0.300>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.291
15 <R=0.291,thr=0.200>, Top: 4< 2 >Fa= 8,<|><>Tot Used: 15 , Added: 7 , Zero Std: 0 , Max Cor: 0.252
16 <R=0.252,thr=0.200>, Top: 3< 3 >Fa= 8,<|><>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.207
17 <R=0.207,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.217
18 <R=0.217,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.221
19 <R=0.221,thr=0.200>, Top: 1< 1 >Fa= 9,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.189
20 <R=0.189,thr=0.200>
[ 20 ], 0.188603 Decor Dimension: 15 Nused: 15 . Cor to Base: 14 , ABase: 15 , Outcome Base: 0
pander::pander(attr(body_fat_Decorrelated,"VarRatio"))
| Weight | La_Ankle | La_Forearm | La_Age | La_Wrist | La_Biceps | La_BodyFat |
|---|---|---|---|---|---|---|
| 1 | 0.555 | 0.518 | 0.483 | 0.403 | 0.32 | 0.29 |
| La_Neck | La_Knee | La_BMI | La_Thigh | La_Abdomen | La_Chest | La_Hip | La_Height |
|---|---|---|---|---|---|---|---|
| 0.279 | 0.228 | 0.211 | 0.183 | 0.132 | 0.104 | 0.0993 | 0.0209 |
The returned data matrix contains the following attributes:
attr(body_fat_Decorrelated,"UPLTM") #The transformation matrix
attr(body_fat_Decorrelated,"fscore") #The score of each feature
attr(body_fat_Decorrelated,"drivingFeatures") #The list of driving features
attr(body_fat_Decorrelated,"unaltered") #The list of unaltered features
attr(body_fat_Decorrelated,"LatentVariables") #The list of latent variables
attr(body_fat_Decorrelated,"R.critical") #The estimated minimum achievable correlation
attr(body_fat_Decorrelated,"IDeAEvolution") #Evolution of the algorithm
attr(body_fat_Decorrelated,"VarRatio") #Variance Ratios: var(Latent)/Var(obs)
The main attribute is “UPLTM”. It stores the specific linear transformation matrix from the observed variables to the latent variables.
The next relevant attribute is “VarRatio”. It stores the fraction of the original feature variance that is still present in the latent variable. All unaltered variables return a “VarRatio” of 1.
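As a minimal illustration of the variance ratio var(Latent)/var(Observed), the following base-R sketch uses simulated data with made-up names and coefficients. It assumes a latent variable formed by subtracting an ordinary least-squares fit from its parent, which is a stand-in for, not a copy of, what ILAA does internally. Under that assumption the ratio equals one minus the R-squared of the fit.

```r
# Simulated, made-up data: hip circumference partly explained by weight
set.seed(42)
n <- 500
weight <- rnorm(n, mean = 80, sd = 12)
hip <- 60 + 0.5 * weight + rnorm(n, sd = 3)

# Stand-in for a latent variable: the parent minus its fitted linear association
fit <- lm(hip ~ weight)
slope <- unname(coef(fit)["weight"])
la_hip <- hip - slope * weight

# Fraction of the parent's variance that survives in the latent variable
var_ratio <- var(la_hip) / var(hip)
r_squared <- summary(fit)$r.squared
print(c(VarRatio = var_ratio, one_minus_R2 = 1 - r_squared))
```

A small ratio therefore means that most of the parent variable was explained by the variables it is associated with.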
The “IDeAEvolution” attribute can be used to verify if
the algorithm achieved the target correlation goal, and the sparsity of
the returned matrix.
Here we will use attr(body_fat_Decorrelated,"IDeAEvolution") to plot the evolution of the correlation measure and the evolution of the matrix sparsity.
par(mfrow=c(1,2),cex=0.5)
# Correlation
yval <- attr(body_fat_Decorrelated,"IDeAEvolution")$Corr
xidx <- c(1:length(yval))
plot(xidx,yval,
xlab="Iteration Cycle",
ylab="Max. Pearson Correlation",
ylim=c(0,1.0),
main="Evolution of the maximum Correlation")
lfit <-try(loess(yval~xidx,span=0.5));
if (!inherits(lfit,"try-error"))
{
plx <- try(predict(lfit,se=TRUE))
if (!inherits(plx,"try-error"))
{
lines(xidx,plx$fit,lty=1,col="red")
}
}
# Sparsity
yval <- attr(body_fat_Decorrelated,"IDeAEvolution")$Spar
plot(xidx,yval,
xlab="Iteration Cycle",
ylab="Matrix Sparsity",
ylim=c(0,1.0),
main="Evolution of the Matrix Sparsity")
lfit <-try(loess(yval~xidx,span=0.5));
if (!inherits(lfit,"try-error"))
{
plx <- try(predict(lfit,se=TRUE))
if (!inherits(plx,"try-error"))
{
lines(xidx,plx$fit,lty=1,col="red")
}
}
Before exploring the properties of the ILAA results in more detail, let us first verify that the returned matrix does not contain features with very high correlation among them.
Here I’ll plot the original correlation and the correlation of the returned data set.
# The original
par(cex=0.6,cex.main=0.85,cex.axis=0.7)
cormat <- cor(body_fat,method="pearson")
gplots::heatmap.2(abs(cormat),
trace = "none",
mar = c(5,5),
col=rev(heat.colors(11)),
main = "Original Correlation",
cexRow = 0.75,
cexCol = 0.75,
srtCol=30,
srtRow=60,
key.title=NA,
key.xlab="|Pearson Correlation|",
xlab="Feature", ylab="Feature")
# The transformed
cormat <- cor(body_fat_Decorrelated,method="pearson")
gplots::heatmap.2(abs(cormat),
trace = "none",
mar = c(5,5),
col=rev(heat.colors(11)),
main = "Correlation After ILAA",
cexRow = 0.75,
cexCol = 0.75,
srtCol=30,
srtRow=60,
key.title=NA,
key.xlab="|Pearson Correlation|",
xlab="Feature", ylab="Feature")
attr(body_fat_Decorrelated,"UPLTM") returns the transformation matrix. The UPLTM is sparse; here I show a heat map of the transformation matrix that marks which elements are different from zero.
UPLTM <- attr(body_fat_Decorrelated,"UPLTM")
gplots::heatmap.2(1.0*(abs(UPLTM)>0),
trace = "none",
mar = c(5,5),
col=rev(heat.colors(2)),
Rowv=NULL,
Colv="Rowv",
dendrogram="none",
main = "Transformation matrix",
cexRow = 0.75,
cexCol = 0.75,
srtCol=30,
srtRow=60,
key.title=NA,
key.xlab="|Beta|>0",
xlab="Output Feature", ylab="Input Feature")
The sparsity of the UPLTM matrix can be analyzed to get the formula for each one of the latent variables. The getLatentCoefficients() function and the attribute attr(LatentFormulas,"LatentCharFormulas") of its result can be used to display the formulas of the latent variables.
# Get a list with the latent formulas' coefficients
LatentFormulas <- getLatentCoefficients(body_fat_Decorrelated)
# A string character with the formulas can be obtained by:
charFormulas <- attr(LatentFormulas,"LatentCharFormulas")
pander::pander(as.matrix(charFormulas))
| Latent variable | Formula |
|---|---|
| La_BodyFat | + BodyFat + (0.120)Weight - (0.800)Abdomen - (0.480)BMI |
| La_Age | + Age + (0.363)Weight - (0.636)Neck - (1.117)Abdomen - (8.09e-04)Hip + (2.273)Thigh - (1.732)Knee - (5.032)Wrist - (0.864)BMI |
| La_Height | - (0.191)Weight + Height + (1.339)BMI |
| La_Neck | - (0.100)Weight + Neck + (0.172)Hip - (0.074)BMI |
| La_Chest | - (0.140)Weight + Chest - (0.363)Abdomen + (0.419)Hip + (0.265)Thigh - (1.082)BMI |
| La_Abdomen | - (0.094)Weight + Abdomen - (1.865)BMI |
| La_Hip | - (0.181)Weight + Hip - (0.430)BMI |
| La_Thigh | - (0.056)Weight + (0.137)Abdomen - (0.489)Hip + Thigh - (0.256)BMI |
| La_Knee | - (0.056)Weight + (0.067)Neck - (0.017)Abdomen - (0.046)Hip - (0.121)Thigh + Knee - (0.406)Wrist + (0.229)BMI |
| La_Ankle | - (0.035)Weight + (0.098)Neck + (0.069)Abdomen + Ankle - (0.594)Wrist - (0.128)BMI |
| La_Biceps | - (0.081)Weight + (0.075)Abdomen + (0.098)Hip - (0.200)Thigh + Biceps - (0.140)BMI |
| La_Forearm | - (0.017)Weight - (0.323)Biceps + Forearm |
| La_Wrist | - (0.012)Weight - (0.165)Neck + Wrist |
| La_BMI | - (0.111)Weight + BMI |
ILAA returns the Unit Preserving Linear Transformation Matrix (UPLTM). This transformation combines the statistically significant linear associations found between feature pairs. Each significant association is modeled by a linear equation; hence, the interpretation of each feature is as follows:
Each discovered latent variable is the residual of the observed parent variable after subtracting the linear model built from the variables associated with the parent variable. For example, \[ LaWrist = Wrist - 0.012Weight - 0.165Neck \]
states that \(Wrist\) is associated with \(Weight\) and \(Neck\). The latent variable \(LaWrist\) is the amount of information in \(Wrist\) that is not explained by \(Weight\) or \(Neck\).
The model of \(Wrist\) is therefore:
\[ Wrist \approx 0.012Weight + 0.165Neck. \]
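The residual interpretation can be checked numerically. The sketch below uses simulated data whose generating coefficients merely echo the example above; it fits an ordinary least-squares model as a stand-in for the association analysis (an assumption, not the ILAA procedure itself) and confirms that the resulting latent variable is no longer linearly correlated with the variables in its model.

```r
# Simulated, made-up measurements (not the body-fat data)
set.seed(7)
n <- 300
weight <- rnorm(n, mean = 180, sd = 25)
neck   <- 20 + 0.1 * weight + rnorm(n, sd = 1.5)
wrist  <- 8 + 0.012 * weight + 0.165 * neck + rnorm(n, sd = 0.6)

# Stand-in for the association model of the parent variable
fit <- lm(wrist ~ weight + neck)
b <- coef(fit)

# Latent variable: parent minus the weighted associated variables
la_wrist <- wrist - b[["weight"]] * weight - b[["neck"]] * neck

cors <- c(wrist_vs_weight    = cor(wrist, weight),
          la_wrist_vs_weight = cor(la_wrist, weight),
          wrist_vs_neck      = cor(wrist, neck),
          la_wrist_vs_neck   = cor(la_wrist, neck))
print(round(cors, 3))
```

The correlations of the latent variable with Weight and Neck are zero up to floating-point error, while the parent variable remains clearly correlated with both.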
The graph_from_adjacency_matrix() function from
igraph can be used to visualize the association between
variables.
par(op)
transform <- attr(body_fat_Decorrelated,"UPLTM") != 0
colnames(transform) <- str_remove_all(colnames(transform),"La_")
transform <- abs(transform*cor(body_fat[,rownames(transform)])) # The weights are proportional to the observed correlation
VertexSize <- attr(body_fat_Decorrelated,"fscore") # The size depends on the variable independence relevance (fscore)
names(VertexSize) <- str_remove_all(names(VertexSize),"La_")
VertexSize <- 10*(VertexSize-min(VertexSize))/(max(VertexSize)-min(VertexSize)) # Normalization
gr <- graph_from_adjacency_matrix(transform,mode = "directed",diag = FALSE,weighted=TRUE)
gr$layout <- layout_with_fr
fc <- cluster_optimal(gr)
plot(fc, gr,
edge.width=2*E(gr)$weight,
edge.arrow.size=0.5,
edge.arrow.width=0.5,
vertex.size=VertexSize,
vertex.label.cex=0.85,
vertex.label.dist=2,
main="Feature Association")
par(op)
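The directed graph above is built from the non-zero pattern of the transformation matrix. As a minimal base-R sketch of that idea, the code below turns a small sparse matrix into an edge list. The matrix is hypothetical: its layout (rows as inputs, columns as outputs) follows the heat map above, and its values are illustrative only.

```r
# Hypothetical sparse transformation; the values are illustrative only
vars <- c("Weight", "Neck", "Wrist", "BMI")
W <- diag(4)
dimnames(W) <- list(vars, c("Weight", "La_Neck", "La_Wrist", "La_BMI"))
W["Weight", "La_Neck"]  <- -0.100
W["Weight", "La_Wrist"] <- -0.012
W["Neck",   "La_Wrist"] <- -0.165
W["Weight", "La_BMI"]   <- -0.111

# Non-zero off-diagonal entries define directed edges: input -> output
adj <- W != 0
diag(adj) <- FALSE
idx  <- which(adj, arr.ind = TRUE)
rows <- unname(idx[, "row"])
cols <- unname(idx[, "col"])

edges <- data.frame(from = rownames(W)[rows],
                    to   = sub("^La_", "", colnames(W)[cols]),
                    beta = W[cbind(rows, cols)])
print(edges)

# Fraction of zero entries in the matrix
sparsity <- mean(W == 0)
print(sparsity)
```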
I’ll generate 100 solutions of the UPLTM, each from a bootstrap resample of the data, and aggregate the non-zero coefficients. Then I’ll plot a heat map of the frequency of hits.
par(op)
dsize <- nrow(body_fat);
taccmatrix <- cor(body_fat)*0;
for (lp in c(1:100))
{
dmat <- ILAA(body_fat[sample(dsize,dsize,replace = TRUE),],thr=0.2)
transform <- attr(dmat,"UPLTM") != 0
colnames(transform) <- str_remove_all(colnames(transform),"La_")
taccmatrix[,colnames(transform)] <- taccmatrix[,colnames(transform)] + transform
}
gplots::heatmap.2(taccmatrix,
trace = "none",
mar = c(5,5),
Rowv=NULL,
Colv="Rowv",
dendrogram="none",
col=rev(heat.colors(11)),
main = "Transform Hits",
cexRow = 0.75,
cexCol = 0.75,
srtCol=30,
srtRow=60,
key.title=NA,
key.xlab="|Beta|>0",
xlab="Output Feature", ylab="Input Feature")
par(op)
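The hit-frequency idea can be tried without ILAA. In the minimal sketch below, a simple rule (absolute Pearson correlation above 0.5) stands in for "the coefficient is non-zero". That rule, the simulated data, and the variable names are all assumptions made for illustration.

```r
# Simulated, made-up data: a and b are associated, c is independent
set.seed(123)
n <- 200
a <- rnorm(n)
b <- 0.8 * a + rnorm(n, sd = 0.6)
c_indep <- rnorm(n)
toy <- data.frame(a = a, b = b, c = c_indep)

n_boot <- 100
hits <- matrix(0, ncol(toy), ncol(toy),
               dimnames = list(names(toy), names(toy)))

for (i in seq_len(n_boot)) {
  # Resample rows with replacement, as in the loop above
  resample <- toy[sample(n, n, replace = TRUE), ]
  # Stand-in selection rule: count a hit when |correlation| exceeds 0.5
  hits <- hits + (abs(cor(resample)) > 0.5)
}

hit_freq <- hits / n_boot
print(hit_freq)
```

Associations that appear in nearly every resample are stable, while those that appear only occasionally depend on the particular sample.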
To reduce the sensitivity of the transformation to the particular data sample, ILAA allows bootstrap estimation of the transformation matrix.
body_fat_Decorrelated <- ILAA(body_fat,thr=0.2,verbose=TRUE,bootstrap=100)
fast | LM | Weight BodyFat Age Weight Height Neck Chest 0.40000000 0.06666667 1.00000000 0.13333333 0.53333333 0.73333333
Included: 15 , Uni p: 0.01 , Base Size: 1 , Rcrit: 0.1467743
1 <R=0.944,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.888
2 <R=0.888,thr=0.800>, Top: 1< 5 >Fa= 2,<|><>Tot Used: 9 , Added: 5 , Zero Std: 0 , Max Cor: 0.860
3 <R=0.860,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.959
4 <R=0.959,thr=0.950>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.735
5 <R=0.735,thr=0.700>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 11 , Added: 1 , Zero Std: 0 , Max Cor: 0.631
6 <R=0.631,thr=0.600>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.501
7 <R=0.501,thr=0.500>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.584
8 <R=0.584,thr=0.500>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.478
9 <R=0.478,thr=0.400>, Top: 2< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.421
10 <R=0.421,thr=0.400>, Top: 1< 1 >Fa= 4,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.375
11 <R=0.375,thr=0.300>, Top: 4< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.384
12 <R=0.384,thr=0.300>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.343
13 <R=0.343,thr=0.300>, Top: 1< 1 >Fa= 7,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.335
14 <R=0.335,thr=0.300>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.291
15 <R=0.291,thr=0.200>, Top: 4< 2 >Fa= 8,<|><>Tot Used: 15 , Added: 7 , Zero Std: 0 , Max Cor: 0.252
16 <R=0.252,thr=0.200>, Top: 3< 3 >Fa= 8,<|><>Tot Used: 15 , Added: 5 , Zero Std: 0 , Max Cor: 0.207
17 <R=0.207,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.217
18 <R=0.217,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.221
19 <R=0.221,thr=0.200>, Top: 1< 1 >Fa= 9,<|><>Tot Used: 15 , Added: 1 , Zero Std: 0 , Max Cor: 0.189
20 <R=0.189,thr=0.200>
[ 20 ], 0.188603 Decor Dimension: 15 Nused: 15 . Cor to Base: 14 , ABase: 15 , Outcome Base: 0
bootstrapping .(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00). (r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.18,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00). (r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.20,w=1.00).(r=0.19,w=1.00).(r=0.19,w=1.00)
Weight La_Ankle La_Forearm La_Age La_Wrist La_Biceps
1.0000000 0.5546475 0.5151418 0.4844892 0.4024186 0.3196694
attr(body_fat_Decorrelated,"VarRatio")
Weight La_Ankle La_Forearm La_Age La_Wrist La_Biceps La_BodyFat
1.00000000 0.55464748 0.51514177 0.48448918 0.40241858 0.31966937 0.29053758
La_Neck La_Knee La_BMI La_Thigh La_Abdomen La_Chest La_Hip
0.27472674 0.23679663 0.21084374 0.18090573 0.13023753 0.10269101 0.09925260
La_Height
0.02073104
## Getting the formulas
LatentFormulas <- getLatentCoefficients(body_fat_Decorrelated)
charFormulas <- attr(LatentFormulas,"LatentCharFormulas")
pander::pander(as.matrix(charFormulas))
| Latent variable | Formula |
|---|---|
| La_BodyFat | + BodyFat + (0.118)Weight - (2.50e-03)Neck - (0.797)Abdomen + (0.015)Wrist - (0.481)BMI |
| La_Age | + Age + (0.280)Weight + (0.107)Height - (0.157)Neck + (2.78e-03)Chest - (1.150)Abdomen + (0.025)Hip + (2.092)Thigh - (1.082)Knee - (0.031)Biceps + (0.014)Forearm - (5.499)Wrist - (0.465)BMI |
| La_Height | - (1.67e-04)BodyFat - (0.191)Weight + Height - (1.05e-04)Neck - (5.59e-04)Chest - (2.80e-04)Abdomen - (3.49e-04)Hip - (4.43e-05)Thigh + (8.72e-04)Biceps - (2.40e-03)Forearm + (6.27e-04)Wrist + (1.340)BMI |
| La_Neck | - (0.098)Weight + Neck - (1.58e-04)Abdomen + (0.171)Hip - (1.65e-03)Biceps - (0.021)Wrist - (0.088)BMI |
| La_Chest | - (0.141)Weight - (1.29e-03)Neck + Chest - (0.368)Abdomen + (0.444)Hip + (0.202)Thigh + (7.73e-03)Wrist - (1.043)BMI |
| La_Abdomen | - (0.102)Weight + Abdomen - (1.858)BMI |
| La_Hip | - (0.181)Weight + Hip - (0.427)BMI |
| La_Thigh | - (0.054)Weight - (3.87e-03)Neck + (2.06e-03)Chest + (0.129)Abdomen - (0.494)Hip + Thigh - (0.018)Biceps + (0.023)Wrist - (0.242)BMI |
| La_Knee | - (0.066)Weight + (0.042)Neck - (2.42e-04)Chest - (0.013)Abdomen + (0.017)Hip - (0.098)Thigh + Knee + (1.34e-03)Biceps - (0.265)Wrist + (0.133)BMI |
| La_Ankle | - (0.032)Weight + (0.094)Neck + (0.049)Abdomen - (8.82e-04)Hip + (1.28e-03)Thigh - (0.015)Knee + Ankle - (0.579)Wrist - (0.092)BMI |
| La_Biceps | - (0.078)Weight - (0.010)Neck + (0.050)Abdomen + (0.097)Hip - (0.199)Thigh + Biceps - (4.30e-03)Wrist - (0.097)BMI |
| La_Forearm | - (0.020)Weight + (6.49e-03)Neck + (4.69e-03)Abdomen - (4.83e-03)Hip + (0.021)Thigh - (0.317)Biceps + Forearm - (0.039)Wrist - (0.011)BMI |
| La_Wrist | - (0.013)Weight - (0.161)Neck + (1.91e-03)Hip + Wrist - (8.94e-04)BMI |
| La_BMI | - (0.110)Weight + BMI |
## The transformation
par(op)
transform <- attr(body_fat_Decorrelated,"UPLTM") != 0 # The non-zero coefficients
colnames(transform) <- str_remove_all(colnames(transform),"La_") # For network analysis
transform <- abs(transform*cor(body_fat[,rownames(transform)])) # The weights are proportional to the observed correlation
gplots::heatmap.2(transform,
trace = "none",
mar = c(5,5),
Rowv=NULL,
Colv="Rowv",
dendrogram="none",
col=rev(heat.colors(11)),
main = "(Transform <> 0)*Correlation",
cexRow = 0.75,
cexCol = 0.75,
srtCol=30,
srtRow=60,
key.title=NA,
key.xlab="|R|",
xlab="Output Feature", ylab="Input Feature")
par(op)
## Network analysis
# The vertex size will be proportional to the fscore of the IDeA procedure.
VertexSize <- attr(body_fat_Decorrelated,"fscore") # The size depends on the variable independence relevance (fscore)
VertexSize <- 10*(VertexSize-min(VertexSize))/(max(VertexSize)-min(VertexSize)) # Normalization
gr <- graph_from_adjacency_matrix(transform,mode = "directed",diag = FALSE,weighted=TRUE)
gr$layout <- layout_with_fr
fc <- cluster_optimal(gr)
plot(fc, gr,
edge.width=2*E(gr)$weight,
edge.arrow.size=0.5,
edge.arrow.width=0.5,
vertex.size=VertexSize,
vertex.label.cex=0.85,
vertex.label.dist=2,
main="Bootstrap: Feature Association")
par(op)
## Here we plot the final degree of correlation among output features
cormat <- cor(body_fat_Decorrelated,method="pearson")
gplots::heatmap.2(abs(cormat),
trace = "none",
mar = c(5,5),
col=rev(heat.colors(11)),
main = "Correlation After ILAA",
cexRow = 0.75,
cexCol = 0.75,
srtCol=30,
srtRow=60,
key.title=NA,
key.xlab="|Pearson Correlation|",
xlab="Feature", ylab="Feature")
par(op)
diag(cormat) <- 0
pander::pander(max(abs(cormat)))
0.163
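The last three lines compute the largest absolute off-diagonal correlation. A small reusable version of that check, which also reports the pair of variables involved, could look like the sketch below. The function name is hypothetical, and the built-in mtcars data set is used only as a stand-in so the example runs on its own.

```r
# Hypothetical helper: largest absolute off-diagonal correlation and the pair that attains it
max_offdiag_cor <- function(x, method = "pearson") {
  cm <- abs(cor(x, method = method))
  diag(cm) <- 0                                   # ignore self-correlations
  pos <- which(cm == max(cm), arr.ind = TRUE)[1, ] # first position of the maximum
  list(value = max(cm),
       pair  = c(rownames(cm)[pos[["row"]]], colnames(cm)[pos[["col"]]]))
}

# Stand-in data: four columns of the built-in mtcars data set
res <- max_offdiag_cor(mtcars[, c("mpg", "wt", "hp", "qsec")])
print(res)
```

Applied to an ILAA output, a value below the requested thr would indicate that the correlation goal was met.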
The following code shows the association of each latent variable with its observed parent variable, and the association of each parent variable with its linear model. For this example I will use "VarRatio" to rank the latent variables, from the ones that keep most of the original variance to the one that keeps the smallest fraction.
par(mfrow=c(1,2),cex=0.45)
fnames <- names(charFormulas)[1]
## Sort by variance ratio (VarRatio), largest first
varratio <- attr(body_fat_Decorrelated,"VarRatio")
varratio <- varratio[names(varratio) %in% names(charFormulas)]
varratio <- varratio[order(-varratio)]
## Plotting
for (fnames in names(varratio))
{
print(fnames)
obsname <- str_remove(fnames,"La_")
menv <- mean(body_fat_Decorrelated[,fnames])
range <- max(body_fat[,obsname])-min(body_fat[,obsname])
ylim <- c(menv-range/2,menv+range/2)
xvals <- c(min(body_fat[,obsname]),max(body_fat[,obsname]))
plot(body_fat[,obsname],
body_fat_Decorrelated[,fnames],
ylim=ylim,
ylab=fnames,
xlab=obsname,
main=paste("ILAA Latent Variable:",fnames))
lmtvals <- lm(body_fat_Decorrelated[,fnames]~body_fat[,obsname])
pred <- lmtvals$coefficients[1] + lmtvals$coefficients[2] * xvals
lines(x=xvals,y=pred,col="red")
text(xvals[1]+(xvals[2]-xvals[1])/2,0.95*(ylim[2]-ylim[1])+ylim[1],sprintf("Slope= %.2f",lmtvals$coefficients[2]))
deformula <- LatentFormulas[[fnames]]
noInames <- names(deformula)[names(deformula) != obsname]
predObs <- -(as.matrix(body_fat[,noInames]) %*% deformula[noInames])
xvals <- c(min(predObs),max(predObs))
plot(predObs,
body_fat[,obsname],
ylab=obsname,
xlab=paste("Model:",obsname),
main=paste("ILAA Generated Predictions of",obsname)
)
lmtvals <- lm(body_fat[,obsname]~predObs)
pred <- lmtvals$coefficients[1] + lmtvals$coefficients[2] * xvals
lines(x=xvals,y=pred,col="red")
ylim <- c(min(body_fat[,obsname]),max(body_fat[,obsname]))
text(xvals[1]+(xvals[2]-xvals[1])/2,0.95*(ylim[2]-ylim[1])+ylim[1],sprintf("Slope= %.2f",lmtvals$coefficients[2]))
}
[1] "La_Ankle"
[1] "La_Forearm"
[1] "La_Age"
[1] "La_Wrist"
[1] "La_Biceps"
[1] "La_BodyFat"
[1] "La_Neck"
[1] "La_Knee"
[1] "La_BMI"
[1] "La_Thigh"
[1] "La_Abdomen"
[1] "La_Chest"
[1] "La_Hip"
[1] "La_Height"
par(op)
The visual inspection of the figures above shows that some latent variables are not associated with their original parent variable, while their linear model is fully correlated with the observed parent variable. A clear example is the last plot in the above figure.
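The behaviour described above can be reproduced with an ordinary least-squares analogy. The sketch below is not the ILAA algorithm; it assumes only that a latent variable is the parent variable minus a linear combination of other variables, as in the formulas used in the plotting code. The variables `height` and `weight` are synthetic.

```r
set.seed(42)
n <- 300
height <- rnorm(n, mean = 70, sd = 3)
weight <- 2.5 * height + rnorm(n, sd = 2)   # parent almost fully explained by height

fit <- lm(weight ~ height)
model_part  <- fitted(fit)      # linear model of the parent variable
latent_part <- residuals(fit)   # what is left over: a residual-style latent variable

cor(weight, model_part)    # high: the model tracks the parent variable
cor(weight, latent_part)   # low: the latent variable barely tracks its parent
cor(height, latent_part)   # essentially zero: the residual is decorrelated
```

When the model explains nearly all of the parent's variance, the latent variable keeps only a small fraction of it, which is what a low "VarRatio" ranking reflects.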
The recommended use of the ILAA transformation in supervised learning is to split the data into training and validation sets, estimate the transformation on the training set, and then apply it to the held-out set. Hence, the next lines of code split the data into training (75%) and testing (25%) sets.
# 75% for training 25% for testing
set.seed(2)
trainsamples <- sample(nrow(body_fat),3*nrow(body_fat)/4)
trainingset <- body_fat[trainsamples,]
testingset <- body_fat[-trainsamples,]
By default, ILAA() transformations are blind to outcome
associations, but in supervised learning the user is free to specify a
target outcome to drive the shape of the transformation matrix.
Outcome-driven transformations try to keep features that are strongly
associated with the target unaltered.
The predictDecorrelate() function can be used to transform
any new dataset using an ILAA-transformed object.
The next code snippet shows the process of transforming the training set and then using the returned object to transform the testing set using both outcome-blind and outcome-driven transformations.
## Outcome-blind
body_fat_Decorrelated_train <- ILAA(trainingset,
thr=0.2,
Outcome="BodyFat",
verbose=TRUE)
fast | LM | Weight Age Weight Height Neck Chest Abdomen 0.07142857 1.00000000 0.14285714 0.50000000 0.78571429 0.71428571
Included: 14 , Uni p: 0.01071429 , Base Size: 1 , Rcrit: 0.1676986
1 <R=0.940,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.880
2 <R=0.880,thr=0.800>, Top: 1< 4 >Fa= 2,<|><>Tot Used: 8 , Added: 4 , Zero Std: 0 , Max Cor: 0.852
3 <R=0.852,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 9 , Added: 1 , Zero Std: 0 , Max Cor: 0.969
4 <R=0.969,thr=0.950>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 9 , Added: 1 , Zero Std: 0 , Max Cor: 0.779
5 <R=0.779,thr=0.700>, Top: 1< 2 >Fa= 2,<|><>Tot Used: 11 , Added: 2 , Zero Std: 0 , Max Cor: 0.694
6 <R=0.694,thr=0.600>, Top: 1< 2 >Fa= 2,<|><>Tot Used: 13 , Added: 2 , Zero Std: 0 , Max Cor: 0.462
7 <R=0.462,thr=0.400>, Top: 2< 1 >Fa= 3,<|><>Tot Used: 13 , Added: 2 , Zero Std: 0 , Max Cor: 0.394
8 <R=0.394,thr=0.300>, Top: 4< 1 >Fa= 6,<|><>Tot Used: 13 , Added: 4 , Zero Std: 0 , Max Cor: 0.374
9 <R=0.374,thr=0.300>, Top: 4< 1 >Fa= 8,<|><>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.351
10 <R=0.351,thr=0.300>, Top: 2< 1 >Fa= 8,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.360
11 <R=0.360,thr=0.300>, Top: 2< 1 >Fa= 9,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.355
12 <R=0.355,thr=0.300>, Top: 1< 1 >Fa= 9,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.294
13 <R=0.294,thr=0.200>, Top: 4< 1 >Fa= 10,<|><>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.304
14 <R=0.304,thr=0.300>, Top: 1< 1 >Fa= 10,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.254
15 <R=0.254,thr=0.200>, Top: 4< 1 >Fa= 10,<|><>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.248
16 <R=0.248,thr=0.200>, Top: 2< 1 >Fa= 10,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.243
17 <R=0.243,thr=0.200>, Top: 1< 1 >Fa= 10,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.198
18 <R=0.198,thr=0.200>
[ 18 ], 0.1979781 Decor Dimension: 14 Nused: 14 . Cor to Base: 13 , ABase: 14 , Outcome Base: 0
pander::pander(attr(body_fat_Decorrelated_train,"drivingFeatures"))
Weight, Hip, BMI, Chest, Abdomen, Thigh, Knee, Neck, Biceps, Wrist, Forearm, Ankle, Height and Age
body_fat_Decorrelated_test <- predictDecorrelate(body_fat_Decorrelated_train
,testingset)
## Outcome-driven transformation
body_fat_Decorrelated_trainD <- ILAA(trainingset,
thr=0.2,
Outcome="BodyFat",
drivingFeatures="BodyFat",
verbose=TRUE)
fast | LM | Abdomen BMI Chest Hip Weight Thigh 2.703526e-46 1.013764e-33 4.927811e-28 1.286637e-24 6.362965e-22 1.567169e-18
Abdomen Age Weight Height Neck Chest Abdomen 0.14285714 0.71428571 0.07142857 0.50000000 0.85714286 1.00000000
Included: 14 , Uni p: 0.01071429 , Base Size: 1 , Rcrit: 0.1676986
1 <R=0.940,thr=0.900>, Top: 2< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 3 , Zero Std: 0 , Max Cor: 0.880
2 <R=0.880,thr=0.800>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 5 , Added: 1 , Zero Std: 0 , Max Cor: 0.783
3 <R=0.783,thr=0.700>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 8 , Added: 3 , Zero Std: 0 , Max Cor: 0.719
4 <R=0.719,thr=0.700>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.695
5 <R=0.695,thr=0.600>, Top: 2< 1 >Fa= 3,<|><>Tot Used: 11 , Added: 3 , Zero Std: 0 , Max Cor: 0.544
6 <R=0.544,thr=0.500>, Top: 3< 2 >Fa= 5,<|><>Tot Used: 12 , Added: 4 , Zero Std: 0 , Max Cor: 0.469
7 <R=0.469,thr=0.400>, Top: 5< 1 >Fa= 5,<|><>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.594
8 <R=0.594,thr=0.500>, Top: 2< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.623
9 <R=0.623,thr=0.600>, Top: 1< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.733
10 <R=0.733,thr=0.700>, Top: 1< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.580
11 <R=0.580,thr=0.500>, Top: 1< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.755
12 <R=0.755,thr=0.700>, Top: 1< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.458
13 <R=0.458,thr=0.400>, Top: 2< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.530
14 <R=0.530,thr=0.500>, Top: 1< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.420
15 <R=0.420,thr=0.400>, Top: 1< 1 >Fa= 6,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.400
16 <R=0.400,thr=0.300>, Top: 5< 2 >Fa= 8,<|><>Tot Used: 14 , Added: 6 , Zero Std: 0 , Max Cor: 0.346
17 <R=0.346,thr=0.300>, Top: 2< 1 >Fa= 8,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.306
18 <R=0.306,thr=0.300>, Top: 2< 1 >Fa= 8,<|><>Tot Used: 14 , Added: 2 , Zero Std: 0 , Max Cor: 0.292
19 <R=0.292,thr=0.200>, Top: 3< 2 >Fa= 8,<|><>Tot Used: 14 , Added: 4 , Zero Std: 0 , Max Cor: 0.301
20 <R=0.301,thr=0.300>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.300
21 <R=0.300,thr=0.200>, Top: 3< 1 >Fa= 9,<|><>Tot Used: 14 , Added: 5 , Zero Std: 0 , Max Cor: 0.335
22 <R=0.335,thr=0.300>, Top: 1< 1 >Fa= 9,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.257
23 <R=0.257,thr=0.200>, Top: 2< 1 >Fa= 9,<|><>Tot Used: 14 , Added: 3 , Zero Std: 0 , Max Cor: 0.239
24 <R=0.239,thr=0.200>, Top: 1< 1 >Fa= 9,<|><>Tot Used: 14 , Added: 1 , Zero Std: 0 , Max Cor: 0.200
25 <R=0.200,thr=0.200>
[ 25 ], 0.1999494 Decor Dimension: 14 Nused: 14 . Cor to Base: 13 , ABase: 14 , Outcome Base: 13
pander::pander(attr(body_fat_Decorrelated_trainD,"drivingFeatures"))
Abdomen, BMI, Chest, Hip, Weight, Thigh, Knee, Neck, Biceps, Forearm, Wrist, Ankle, Age and Height
body_fat_Decorrelated_testD <- predictDecorrelate(body_fat_Decorrelated_trainD
,testingset)
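The fit-on-training, apply-to-testing pattern used above can be illustrated without FRESA.CAD. The sketch below is a simplified two-variable analogue, not the ILAA algorithm: it learns one decorrelating coefficient on the training rows and applies the same matrix to the testing rows. All names (`dat`, `W`, `La_x2`) are illustrative assumptions.

```r
set.seed(2)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)
dat <- data.frame(x1 = x1, x2 = x2)

idx   <- sample(n, 3 * n / 4)   # 75% of the rows for training
train <- dat[idx, ]
test  <- dat[-idx, ]

# Learn the transformation on the training set only: La_x2 = x2 - b * x1
b <- coef(lm(x2 ~ x1, data = train))[["x1"]]
W <- matrix(c(1, 0, -b, 1), nrow = 2,
            dimnames = list(c("x1", "x2"), c("x1", "La_x2")))

# Apply the SAME matrix to both sets
train_t <- as.matrix(train) %*% W
test_t  <- as.matrix(test)  %*% W

cor(train$x1, train$x2)                    # strong correlation before
cor(train_t[, "x1"], train_t[, "La_x2"])   # essentially zero on the training set
cor(test_t[, "x1"],  test_t[, "La_x2"])    # small, but not exactly zero, on unseen data
```

Because the matrix is estimated without looking at the testing rows, the residual correlation measured on the testing set is an honest estimate of how well the transformation generalizes.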
Once we have a transformed training and testing set, we can proceed
to train a linear model of the body fat content. For this example we
will use the LASSO_1SE() function of the FRESA.CAD package
to model the \(BodyFat\) using all the
variables in the transformed training set.
## Outcome-Blind
modelBodyFat <- LASSO_1SE(BodyFat~.,body_fat_Decorrelated_train)
pander::pander(as.matrix(modelBodyFat$coef))
| (Intercept) | -3.592 |
| Weight | 0.150 |
| La_Abdomen | 0.718 |
| La_BMI | 1.515 |
## Outcome-Driven
modelBodyFatD <- LASSO_1SE(BodyFat~.,body_fat_Decorrelated_trainD)
pander::pander(as.matrix(modelBodyFatD$coef))
| (Intercept) | -38.4288 |
| La_Weight | -0.0489 |
| La_Neck | -0.2434 |
| Abdomen | 0.6053 |
| La_Hip | -0.0883 |
The printed beta coefficients of the models show that the LASSO models are different between the Outcome-driven and outcome-blind ILAA methods.
Here we check the variance inflation factor (VIF) on the training and testing sets.
frm <- paste("BodyFat~",str_flatten(modelBodyFat$selectedfeatures," + "))
X <- model.matrix(formula(frm),body_fat_Decorrelated_train);
mc <- multiCol(X)
title("Train VIF")
vifd <- VIF(X)
vifx <-vif(lm(formula(frm),body_fat_Decorrelated_train))
X <- model.matrix(formula(frm),body_fat_Decorrelated_test);
mc <- multiCol(X)
title("Test VIF")
frm <- paste("BodyFat~",str_flatten(modelBodyFatD$selectedfeatures," + "))
X <- model.matrix(formula(frm),body_fat_Decorrelated_trainD);
mc <- multiCol(X)
title("Driven: Train VIF")
X <- model.matrix(formula(frm),body_fat_Decorrelated_testD);
mc <- multiCol(X)
title("Driven: Test VIF")
The plots clearly indicate that neither model has collinearity issues.
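For readers who want to see what a VIF measures, the standard definition is \(VIF_j = 1/(1-R_j^2)\), where \(R_j^2\) is obtained by regressing predictor \(j\) on the remaining predictors. The sketch below computes it by hand in base R on synthetic data; `vif_by_hand` is a hypothetical helper and makes no claim about how the multiColl or car packages implement their functions.

```r
# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others
vif_by_hand <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    frm <- reformulate(setdiff(names(X), v), response = v)
    1 / (1 - summary(lm(frm, data = X))$r.squared)
  })
}

set.seed(3)
x1 <- rnorm(150)
toy <- data.frame(x1 = x1,
                  x2 = x1 + rnorm(150, sd = 0.2),  # almost collinear with x1
                  x3 = rnorm(150))                 # independent predictor
vifs <- vif_by_hand(toy)
round(vifs, 2)   # x1 and x2 get large VIFs; x3 stays near 1
```

A common rule of thumb treats VIF values above 5 or 10 as a sign of problematic collinearity.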
The FRESA.CAD package provides a handy function,
getObservedCoef(), to get the linear beta coefficients in the
observed space from the transformed object. The next code shows the procedure.
# Get the coefficients in the observed space for the outcome-blind
observedCoef <- getObservedCoef(body_fat_Decorrelated_train,modelBodyFat)
pander::pander(as.matrix(observedCoef$coefficients))
| (Intercept) | -3.59177 |
| Weight | -0.00299 |
| Chest | -0.36091 |
| Abdomen | 0.71823 |
| Hip | -0.26029 |
| BMI | 0.75375 |
# The outcome-driven coefficients
observedCoefD <- getObservedCoef(body_fat_Decorrelated_trainD,modelBodyFatD)
pander::pander(as.matrix(observedCoefD$coefficients))
| (Intercept) | -38.4288 |
| Weight | -0.0489 |
| Neck | -0.1211 |
| Abdomen | 0.6835 |
| Hip | 0.0569 |
| BMI | 0.0789 |
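The idea behind mapping coefficients back to the observed space can be sketched with plain matrix algebra. Assuming the transformed data are \(Z = XW\) and the model is \(\hat{y} = Z\beta\), the same predictions are obtained from the observed data with \(\hat{y} = X(W\beta)\). The matrix `W` and the coefficients below are made-up values for illustration; this is not the internal code of getObservedCoef().

```r
set.seed(4)
n <- 100
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))

# Hypothetical 2 x 2 transformation matrix: La_x2 = x2 - 0.5 * x1
W <- matrix(c(1, 0, -0.5, 1), nrow = 2,
            dimnames = list(colnames(X), c("x1", "La_x2")))
Z <- X %*% W

beta_latent   <- c(x1 = 2, La_x2 = 3)      # coefficients in the transformed space
beta_observed <- drop(W %*% beta_latent)   # same model expressed on x1 and x2

pred_latent   <- drop(Z %*% beta_latent)
pred_observed <- drop(X %*% beta_observed)
all.equal(pred_latent, pred_observed)      # TRUE: identical predictions
beta_observed                              # x1 = 0.5, x2 = 3
```

Note how a model with two latent terms can spread its weight over the parent variables in the observed space, which is why the observed-space tables above list variables such as Chest and Hip that do not appear by name in the latent model.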
Here we check the variance inflation factor (VIF) on the training and testing sets using the observed variables.
X <- model.matrix(formula(observedCoef$formula),trainingset);
mc <- multiCol(X)
title("Observed Training VIF")
X <- model.matrix(formula(observedCoef$formula),testingset);
mc <- multiCol(X)
title("Observed Testing VIF")
X <- model.matrix(formula(observedCoefD$formula),trainingset);
mc <- multiCol(X)
title("Driven: Observed Training VIF")
X <- model.matrix(formula(observedCoefD$formula),testingset);
mc <- multiCol(X)
title("Driven: Observed Testing VIF")
The results indicate that the models created using the observed variables have strong collinearity issues.
The user can predict the BodyFat content using the handy
predict() function. After that we can measure the testing
performance using the predictionStats_regression()
function.
## Outcome-Blind
predicBodyFat <- predict(modelBodyFat,body_fat_Decorrelated_test)
rmetrics <- predictionStats_regression(cbind(testingset$BodyFat,
predicBodyFat),
"Body Fat: Blind")
Body Fat: Blind
pander::pander(rmetrics)
corci:
| cor | ||
|---|---|---|
| 0.811 | 0.706 | 0.882 |
biasci: -0.0537, -1.2272 and 1.1199
RMSEci: 4.66, 3.97 and 5.64
spearmanci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.818 | 0.699 | 0.895 |
MAEci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 3.71 | 3.08 | 4.44 |
pearson:
| Test statistic | df | P value | Alternative hypothesis | cor |
|---|---|---|---|---|
| 10.8 | 61 | 7.29e-16 * * * | two.sided | 0.811 |
## Outcome-Driven
predicBodyFatD <- predict(modelBodyFatD,body_fat_Decorrelated_testD)
rmetrics <- predictionStats_regression(cbind(testingset$BodyFat,
predicBodyFatD),
"Body Fat: Driven")
Body Fat: Driven
pander::pander(rmetrics)
corci:
| cor | ||
|---|---|---|
| 0.821 | 0.72 | 0.888 |
biasci: 0.184, -0.967 and 1.335
RMSEci: 4.57, 3.90 and 5.54
spearmanci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.832 | 0.716 | 0.904 |
MAEci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 3.52 | 2.83 | 4.29 |
pearson:
| Test statistic | df | P value | Alternative hypothesis | cor |
|---|---|---|---|---|
| 11.2 | 61 | 1.68e-16 * * * | two.sided | 0.821 |
The reported metrics indicate that the model predictions are highly correlated with the real \(BodyFat\).
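The kind of summary reported above can be approximated in base R. The sketch below computes point estimates and a percentile bootstrap interval for the RMSE on synthetic data. It is only one plausible way to obtain such intervals; the sign convention for the bias and the number of resamples are assumptions, and it does not claim to reproduce how predictionStats_regression() works internally.

```r
set.seed(5)
observed  <- rnorm(60, mean = 20, sd = 8)     # synthetic outcome values
predicted <- observed + rnorm(60, sd = 4)     # synthetic model predictions

point <- c(cor  = cor(observed, predicted),
           bias = mean(predicted - observed),               # assumed sign convention
           RMSE = sqrt(mean((predicted - observed)^2)),
           MAE  = mean(abs(predicted - observed)))

# Percentile bootstrap for the RMSE (1000 resamples of the test rows)
boot_rmse <- replicate(1000, {
  i <- sample(length(observed), replace = TRUE)
  sqrt(mean((predicted[i] - observed[i])^2))
})

round(point, 3)
round(quantile(boot_rmse, c(0.5, 0.025, 0.975)), 3)
```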
An ILAA user also has the option to predict the \(BodyFat\) content directly from the observed testing
set using the computed beta coefficients. The next lines of code show
how to do the prediction using the model.matrix() R function
and the matrix product operator %*% :
predicBodyFatObst <- model.matrix(formula(observedCoef$formula),testingset) %*% observedCoef$coefficients
plot(predicBodyFatObst,
predicBodyFat,
xlab="Observed Space",
ylab="Transformed Space",
main="Test Predictions: Observed vs. Transformed")
The last plot shows the expected result: both predictions are identical.
A last experiment is to compare the differences between a LASSO model created from the observed features to the model created from the transformed observations.
The next lines of code compute the linear model using LASSO from the original observed data. Then, it computes the predicted performance.
rawmodelBodyFat <- LASSO_1SE(BodyFat~.,trainingset)
pander::pander(rawmodelBodyFat$coef)
| (Intercept) | Height | Abdomen |
|---|---|---|
| -23.2 | -0.189 | 0.601 |
rawpredicBodyFat <- predict(rawmodelBodyFat,testingset)
rmetrics <- predictionStats_regression(cbind(testingset$BodyFat,
rawpredicBodyFat),"Body Fat")
Body Fat
pander::pander(rmetrics)
corci:
| cor | ||
|---|---|---|
| 0.808 | 0.701 | 0.88 |
biasci: 0.169, -1.020 and 1.358
RMSEci: 4.72, 4.02 and 5.72
spearmanci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.813 | 0.687 | 0.894 |
MAEci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 3.66 | 3 | 4.43 |
pearson:
| Test statistic | df | P value | Alternative hypothesis | cor |
|---|---|---|---|---|
| 10.7 | 61 | 1.13e-15 * * * | two.sided | 0.808 |
The evaluation of the testing results indicates that the observed model predictions have a correlation of 0.808. This is slightly lower than, but not statistically different from, the correlations obtained from the models estimated in the transformed space ( \(\rho _o=0.808\) vs. \(\rho _t=0.811\) for the outcome-blind model and \(\rho _t=0.821\) for the outcome-driven model ); the reported confidence intervals overlap.
The main advantage of the ILAA transformation is that the returned
latent variables are not collinear; hence, the statistical significance of
the beta coefficients is not affected by multicollinearity. The next
code snippet shows how to get the beta coefficients using the
lm() and summary.lm() functions.
The inspection of the summary results shows that the beta coefficients of the models estimated on the transformed data sets are significant.
## Raw Model
par(mfrow=c(2,2),cex=0.5)
rawlm <- lm(BodyFat~.,
trainingset[,c("BodyFat",names(rawmodelBodyFat$coef)[-1])])
pander::pander(rawlm,add.significance.stars=TRUE)
| | Estimate | Std. Error | t value | Pr(>\|t\|) | |
|---|---|---|---|---|---|
| (Intercept) | -4.733 | 9.2028 | -0.514 | 6.08e-01 | |
| Height | -0.580 | 0.1324 | -4.384 | 1.95e-05 | * * * |
| Abdomen | 0.699 | 0.0331 | 21.117 | 3.61e-51 | * * * |
plot(rawlm)
## Outcome-Blind
par(mfrow=c(2,2),cex=0.5)
Delm <- lm(BodyFat~.,body_fat_Decorrelated_train[,c("BodyFat",names(modelBodyFat$coef)[-1])])
pander::pander(Delm,add.significance.stars=TRUE)
| | Estimate | Std. Error | t value | Pr(>\|t\|) | |
|---|---|---|---|---|---|
| (Intercept) | -7.941 | 3.2000 | -2.48 | 1.40e-02 | * |
| Weight | 0.180 | 0.0122 | 14.79 | 3.95e-33 | * * * |
| La_Abdomen | 0.952 | 0.0956 | 9.96 | 5.91e-19 | * * * |
| La_BMI | 2.079 | 0.2065 | 10.07 | 2.88e-19 | * * * |
plot(Delm)
## Outcome-Driven
par(mfrow=c(2,2),cex=0.5)
Delm <- lm(BodyFat~.,
body_fat_Decorrelated_trainD[,c("BodyFat",names(modelBodyFatD$coef)[-1])])
pander::pander(Delm,add.significance.stars=TRUE)
| | Estimate | Std. Error | t value | Pr(>\|t\|) | |
|---|---|---|---|---|---|
| (Intercept) | -44.155 | 12.8683 | -3.43 | 7.43e-04 | * * * |
| La_Weight | -0.131 | 0.0442 | -2.96 | 3.49e-03 | * * |
| La_Neck | -0.655 | 0.2221 | -2.95 | 3.59e-03 | * * |
| Abdomen | 0.671 | 0.0320 | 21.00 | 1.29e-50 | * * * |
| La_Hip | -0.297 | 0.1019 | -2.92 | 3.98e-03 | * * |
plot(Delm)
par(op)
This last experiment showcases the effect of data transformation on logistic modeling. The experiment starts by creating a data frame that does not include the \(BMI\), \(Height\), and \(Weight\) variables. The target outcome is to identify whether the person is overweight (BMI>=25) or normal. The next lines of code compute the new data frames and remove the above-mentioned variables.
First Remove Height and Weight from Training and Testing Sets
trainingsetBMI <- trainingset[,!(colnames(trainingset) %in% c("Weight","Height"))]
testingsetBMI <- testingset[,!(colnames(testingset) %in% c("Weight","Height"))]
trainingsetBMI$Overweight <- 1*(trainingsetBMI$BMI>=25)
testingsetBMI$Overweight <- 1*(testingsetBMI$BMI>=25)
trainingsetBMI$BMI <- NULL
testingsetBMI$BMI <- NULL
# The number of subjects
pander::pander(table(trainingsetBMI$Overweight))
| 0 | 1 |
|---|---|
| 96 | 92 |
pander::pander(table(testingsetBMI$Overweight))
| 0 | 1 |
|---|---|
| 29 | 34 |
## The outcome-blind transformation
OW_Decorrelated_train <- ILAA(trainingsetBMI,
thr=0.2,
Outcome="Overweight",
verbose=TRUE)
fast | LM | Chest BodyFat Age Neck Chest Abdomen Hip 0.58333333 0.08333333 0.50000000 1.00000000 0.91666667 0.83333333
Included: 12 , Uni p: 0.0125 , Base Size: 1 , Rcrit: 0.1634602
1 <R=0.917,thr=0.900>, Top: 1< 1 >Fa= 1,<|><>Tot Used: 2 , Added: 1 , Zero Std: 0 , Max Cor: 0.880
2 <R=0.880,thr=0.800>, Top: 1< 1 >Fa= 1,<|><>Tot Used: 3 , Added: 1 , Zero Std: 0 , Max Cor: 0.783
3 <R=0.783,thr=0.700>, Top: 1< 4 >Fa= 1,<|><>Tot Used: 7 , Added: 4 , Zero Std: 0 , Max Cor: 0.706
4 <R=0.706,thr=0.700>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 7 , Added: 1 , Zero Std: 0 , Max Cor: 0.697
5 <R=0.697,thr=0.600>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 10 , Added: 3 , Zero Std: 0 , Max Cor: 0.640
6 <R=0.640,thr=0.600>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.567
7 <R=0.567,thr=0.500>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.497
8 <R=0.497,thr=0.400>, Top: 4< 1 >Fa= 5,<|><>Tot Used: 11 , Added: 4 , Zero Std: 0 , Max Cor: 0.425
9 <R=0.425,thr=0.400>, Top: 1< 1 >Fa= 5,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.381
10 <R=0.381,thr=0.300>, Top: 4< 1 >Fa= 6,<|><>Tot Used: 12 , Added: 3 , Zero Std: 0 , Max Cor: 0.382
11 <R=0.382,thr=0.300>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.295
12 <R=0.295,thr=0.200>, Top: 4< 2 >Fa= 7,<|><>Tot Used: 12 , Added: 5 , Zero Std: 0 , Max Cor: 0.299
13 <R=0.299,thr=0.200>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 12 , Added: 2 , Zero Std: 0 , Max Cor: 0.257
14 <R=0.257,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.203
15 <R=0.203,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.189
16 <R=0.189,thr=0.200>
[ 16 ], 0.1894651 Decor Dimension: 12 Nused: 12 . Cor to Base: 11 , ABase: 12 , Outcome Base: 0
OW_Decorrelated_test <- predictDecorrelate(OW_Decorrelated_train,testingsetBMI)
## The outcome-driven transformation
OW_Decorrelated_trainD <- ILAA(trainingsetBMI,
thr=0.2,
Outcome="Overweight",
drivingFeatures="Overweight",
verbose=TRUE)
fast | LM | Chest Abdomen Hip Neck Thigh Biceps 6.940373e-28 9.802035e-28 5.772864e-23 9.353353e-19 1.810521e-18 1.810521e-18
Chest BodyFat Age Neck Chest Abdomen Hip 0.50000000 0.08333333 0.75000000 1.00000000 0.91666667 0.83333333
Included: 12 , Uni p: 0.0125 , Base Size: 1 , Rcrit: 0.1634602
1 <R=0.917,thr=0.900>, Top: 1< 1 >Fa= 1,<|><>Tot Used: 2 , Added: 1 , Zero Std: 0 , Max Cor: 0.880
2 <R=0.880,thr=0.800>, Top: 1< 1 >Fa= 1,<|><>Tot Used: 3 , Added: 1 , Zero Std: 0 , Max Cor: 0.783
3 <R=0.783,thr=0.700>, Top: 1< 4 >Fa= 1,<|><>Tot Used: 7 , Added: 4 , Zero Std: 0 , Max Cor: 0.706
4 <R=0.706,thr=0.700>, Top: 1< 1 >Fa= 2,<|><>Tot Used: 7 , Added: 1 , Zero Std: 0 , Max Cor: 0.697
5 <R=0.697,thr=0.600>, Top: 1< 3 >Fa= 2,<|><>Tot Used: 10 , Added: 3 , Zero Std: 0 , Max Cor: 0.640
6 <R=0.640,thr=0.600>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.567
7 <R=0.567,thr=0.500>, Top: 1< 1 >Fa= 3,<|><>Tot Used: 10 , Added: 1 , Zero Std: 0 , Max Cor: 0.497
8 <R=0.497,thr=0.400>, Top: 4< 1 >Fa= 5,<|><>Tot Used: 11 , Added: 4 , Zero Std: 0 , Max Cor: 0.425
9 <R=0.425,thr=0.400>, Top: 1< 1 >Fa= 5,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.381
10 <R=0.381,thr=0.300>, Top: 4< 1 >Fa= 6,<|><>Tot Used: 12 , Added: 3 , Zero Std: 0 , Max Cor: 0.382
11 <R=0.382,thr=0.300>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.295
12 <R=0.295,thr=0.200>, Top: 4< 2 >Fa= 7,<|><>Tot Used: 12 , Added: 5 , Zero Std: 0 , Max Cor: 0.299
13 <R=0.299,thr=0.200>, Top: 2< 1 >Fa= 7,<|><>Tot Used: 12 , Added: 2 , Zero Std: 0 , Max Cor: 0.257
14 <R=0.257,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.203
15 <R=0.203,thr=0.200>, Top: 1< 1 >Fa= 8,<|><>Tot Used: 12 , Added: 1 , Zero Std: 0 , Max Cor: 0.189
16 <R=0.189,thr=0.200>
[ 16 ], 0.1894651 Decor Dimension: 12 Nused: 12 . Cor to Base: 11 , ABase: 12 , Outcome Base: 12
OW_Decorrelated_testD <- predictDecorrelate(OW_Decorrelated_trainD,testingsetBMI)
The last code snippet transforms the observed features using ILAA with Overweight set as the target variable: first with the transformation blind to the target outcome, and then with the target outcome driving it.
LASSO_1SE with a binomial family is used to compute the logistic model of overweight.
## Outcome-blind
modelOverweight <- LASSO_1SE(Overweight~.,
OW_Decorrelated_train,
family="binomial")
pander::pander(as.matrix(modelOverweight$coef))
| (Intercept) | -36.9601 |
| Chest | 0.3877 |
| La_Abdomen | 0.0717 |
| La_Hip | 0.0351 |
## Outcome-driven
modelOverweightD <- LASSO_1SE(Overweight~.,
OW_Decorrelated_trainD,
family="binomial")
pander::pander(as.matrix(modelOverweightD$coef))
| (Intercept) | -40.3810 |
| Chest | 0.4225 |
| La_Abdomen | 0.0886 |
| La_Hip | 0.0561 |
Once the logistic model is created in the transformed space, we can compute the beta coefficients for each one of the observed variables.
# Get the coefficients in the observed space
observedCoef <- getObservedCoef(OW_Decorrelated_train,modelOverweight)
pander::pander(as.matrix(observedCoef$coefficients))
| (Intercept) | -36.9601 |
| Chest | 0.3070 |
| Abdomen | 0.0717 |
| Hip | -0.0031 |
The predictions of the testing set can be done using the handy
predict() function. The testing results
can be evaluated using the predictionStats_binary()
function.
## Outcome-blind
predicOverweight <- predict(modelOverweight,OW_Decorrelated_test)
pr <- predictionStats_binary(cbind(OW_Decorrelated_test$Overweight,
predicOverweight),"Overweight: Blind")
pander::pander(pr$ClassMetrics)
accci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.841 | 0.746 | 0.921 |
senci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.841 | 0.75 | 0.921 |
aucci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.841 | 0.75 | 0.921 |
berci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.159 | 0.0788 | 0.25 |
preci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.839 | 0.744 | 0.921 |
F1ci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.838 | 0.744 | 0.92 |
## Outcome-Driven
predicOverweightD <- predict(modelOverweightD,OW_Decorrelated_testD)
pr <- predictionStats_binary(cbind(OW_Decorrelated_testD$Overweight,
predicOverweightD),"Overweight: Driven")
pander::pander(pr$ClassMetrics)
accci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.841 | 0.746 | 0.921 |
senci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.843 | 0.75 | 0.927 |
aucci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.843 | 0.75 | 0.927 |
berci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.157 | 0.0732 | 0.25 |
preci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.842 | 0.747 | 0.923 |
F1ci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.841 | 0.744 | 0.921 |
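The class metrics listed above follow standard confusion-matrix definitions. The sketch below computes point estimates of accuracy, sensitivity, specificity, balanced error rate, precision, and F1 on synthetic data in base R. The 0.5 decision threshold and the variable names are assumptions for illustration, and the bootstrap intervals reported by predictionStats_binary() are not reproduced here.

```r
set.seed(6)
truth <- rbinom(80, 1, 0.5)
prob  <- plogis(2 * (truth - 0.5) + rnorm(80))   # synthetic predicted probabilities
pred  <- as.integer(prob >= 0.5)                  # assumed 0.5 decision threshold

tp <- sum(pred == 1 & truth == 1); tn <- sum(pred == 0 & truth == 0)
fp <- sum(pred == 1 & truth == 0); fn <- sum(pred == 0 & truth == 1)

accuracy    <- (tp + tn) / length(truth)
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
precision   <- tp / (tp + fp)
ber         <- 1 - (sensitivity + specificity) / 2   # balanced error rate
f1          <- 2 * precision * sensitivity / (precision + sensitivity)

round(c(acc = accuracy, sen = sensitivity, spe = specificity,
        ber = ber, pre = precision, F1 = f1), 3)
```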
The prediction of the testing set can also be done using
model.matrix() and the matrix product %*%.
predicOverweightObst <- model.matrix(formula(observedCoef$formula),testingsetBMI) %*% observedCoef$coefficients
#predicOverweightObst <- 1.0/(1.0 + exp(-predicOverweightObst));
plot(predicOverweightObst,predicOverweight,
xlab="Observed",
ylab="Transformed",
main="Test predictions: Observed vs. Transformed")
The last plot shows the expected result: both predictions are identical.
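The matrix product above yields values on the linear-predictor (log-odds) scale. The commented-out line in the snippet applies the logistic function to turn them into probabilities. A minimal base R sketch of that conversion, using made-up values of the linear predictor, is:

```r
eta <- c(-3, -1, 0, 1, 3)      # example linear-predictor values (log-odds)
p1  <- 1 / (1 + exp(-eta))     # logistic transform written out
p2  <- plogis(eta)             # base R equivalent

all.equal(p1, p2)              # TRUE
# Thresholding the log-odds at 0 is the same as thresholding the probability at 0.5
all(as.integer(eta >= 0) == as.integer(p2 >= 0.5))   # TRUE
```

Because the logistic function is monotone, ranking-based metrics such as the ROC AUC are the same on either scale.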
To showcase the advantage of transformed modeling vs. raw modeling, here I’ll estimate the logistic model from the observed variables and contrast to the model generated from the transformed space.
The next lines of code compute the logistic model and display its testing performance:
##Training
rawmodelOverweight <- LASSO_1SE(Overweight~.,
trainingsetBMI,
family="binomial")
pander::pander(rawmodelOverweight$coef)
| (Intercept) | BodyFat | Chest | Abdomen | Thigh | Ankle | Biceps |
|---|---|---|---|---|---|---|
| -39.9 | 0.0108 | 0.206 | 0.147 | 0.0275 | 0.064 | 0.0818 |
## Predict
rawpredicOverweight <- predict(rawmodelOverweight,testingsetBMI)
pr <- predictionStats_binary(cbind(testingsetBMI$Overweight,
rawpredicOverweight),"Overweight")
pander::pander(pr$ClassMetrics)
accci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.873 | 0.778 | 0.952 |
senci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.873 | 0.779 | 0.948 |
aucci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.873 | 0.779 | 0.948 |
berci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.127 | 0.052 | 0.221 |
preci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.877 | 0.788 | 0.952 |
F1ci:
| 50% | 2.5% | 97.5% |
|---|---|---|
| 0.872 | 0.777 | 0.95 |
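The model created from the observed data has a ROC AUC (0.873) that is not statistically different from that of the transformed models (0.841 outcome-blind and 0.843 outcome-driven): the reported confidence intervals overlap.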
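For reference, the ROC AUC of a set of scores can be computed from ranks (the Mann-Whitney formulation). The sketch below is a hypothetical base R helper, `auc_rank`, shown on a four-sample toy example; it is not the FRESA.CAD implementation.

```r
# AUC = probability that a random positive scores higher than a random negative
auc_rank <- function(score, label) {
  r  <- rank(score)                    # mid-ranks handle ties
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

toy_auc <- auc_rank(score = c(0.1, 0.4, 0.35, 0.8),
                    label = c(0,   0,   1,    1))
toy_auc   # 0.75: three of the four positive-negative pairs are ordered correctly
```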
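These last lines of code compute the significance of the beta coefficients for both the observed model and the latent-based models, using linear fits with lm(). In the observed model, fitted on the training set, only Abdomen and Biceps reach statistical significance among the six selected predictors. Note that the latent-based fits shown below use the transformed testing sets (OW_Decorrelated_test and OW_Decorrelated_testD); in that smaller sample, Chest is the only statistically significant predictor.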
par(mfrow=c(2,2),cex=0.5)
## Raw model
rawlm <- lm(Overweight~.,trainingsetBMI[,c("Overweight",names(rawmodelOverweight$coef)[-1])])
pander::pander(rawlm,add.significance.stars=TRUE)
| | Estimate | Std. Error | t value | Pr(>\|t\|) | |
|---|---|---|---|---|---|
| (Intercept) | -3.96182 | 0.40972 | -9.669 | 4.30e-18 | * * * |
| BodyFat | 0.00588 | 0.00492 | 1.194 | 2.34e-01 | |
| Chest | 0.01521 | 0.00781 | 1.947 | 5.31e-02 | |
| Abdomen | 0.01486 | 0.00743 | 2.000 | 4.70e-02 | * |
| Thigh | -0.00120 | 0.00807 | -0.149 | 8.82e-01 | |
| Ankle | 0.02388 | 0.01761 | 1.356 | 1.77e-01 | |
| Biceps | 0.03003 | 0.01229 | 2.444 | 1.55e-02 | * |
plot(rawlm)
## Outcome-blind
par(mfrow=c(2,2),cex=0.5)
Delm <- lm(Overweight~.,OW_Decorrelated_test[,c("Overweight",names(modelOverweight$coef)[-1])])
pander::pander(Delm,add.significance.stars=TRUE)
| | Estimate | Std. Error | t value | Pr(>\|t\|) | |
|---|---|---|---|---|---|
| (Intercept) | -2.16781 | 0.83956 | -2.582 | 1.23e-02 | * |
| Chest | 0.03524 | 0.00548 | 6.434 | 2.44e-08 | * * * |
| La_Abdomen | 0.01911 | 0.01274 | 1.500 | 1.39e-01 | |
| La_Hip | -0.00373 | 0.00997 | -0.374 | 7.10e-01 | |
plot(Delm)
## Outcome-Driven
par(mfrow=c(2,2),cex=0.5)
Delm <- lm(Overweight~.,OW_Decorrelated_testD[,c("Overweight",names(modelOverweightD$coef)[-1])])
pander::pander(Delm,add.significance.stars=TRUE)
| | Estimate | Std. Error | t value | Pr(>\|t\|) | |
|---|---|---|---|---|---|
| (Intercept) | -2.16781 | 0.83956 | -2.582 | 1.23e-02 | * |
| Chest | 0.03524 | 0.00548 | 6.434 | 2.44e-08 | * * * |
| La_Abdomen | 0.01911 | 0.01274 | 1.500 | 1.39e-01 | |
| La_Hip | -0.00373 | 0.00997 | -0.374 | 7.10e-01 | |
plot(Delm)
In conclusion, ILAA (Iterative Linear Association Analysis) is an unsupervised computational method for estimating linear transformation matrices. These matrices convert a dataset into a new latent-variable space with a user-controlled degree of residual correlation. This tutorial has demonstrated the practical use of the ILAA functions for estimating, applying, and inspecting transformations, capabilities that are useful in supervised learning scenarios, including regression and logistic models.